Key Phrase Extraction of Lightly Filtered Broadcast News
نویسندگان
چکیده
This paper explores the impact of light filtering on automatic key phrase extraction (AKE) applied to Broadcast News (BN). Key phrases are words and expressions that best characterize the content of a document. Key phrases are often used to index the document or as features in further processing. This makes improvements in AKE accuracy particularly important. We hypothesized that filtering out marginally relevant sentences from a document would improve AKE accuracy. Our experiments confirmed this hypothesis. Elimination of as little as 10% of the document sentences lead to a 2% improvement in AKE precision and recall. AKE is built over MAUI toolkit that follows a supervised learning approach. We trained and tested our AKE method on a gold standard made of 8 BN programs containing 110 manually annotated news stories. The experiments were conducted within a Multimedia Monitoring Solution (MMS) system for TV and radio news/programs, running daily, and monitoring 12 TV and 4 radio channels.
منابع مشابه
Improved Chinese broadcast news transcription by language modeling with temporally consistent training corpora and iterative phrase extraction
In this paper an iterative Chinese new phrase extraction method based on the intra-phrase association and context variation statistics is proposed. A Chinese language model enhancement framework including lexicon expansion is then developed. Extensive experiments for Chinese broadcast news transcription were then performed to explore the achievable improvements with respect to the degree of tem...
متن کاملAutomatic title generation for Chinese spoken documents using an adaptive k nearest-neighbor approach
The purpose of automatic title generation is to understand a document and to summarize it with only several but readable words or phrases. It is important for browsing and retrieving spoken documents, which may be automatically transcribed, but it will be much more helpful if given the titles indicating the content subjects of the documents. For title generation for Chinese language, additional...
متن کاملConnectionist speech recognition of Broadcast News
This paper describes connectionist techniques for recognition of Broadcast News. The fundamental difference between connectionist systems and more conventional mixture-of-Gaussian systems is that connectionist models directly estimate posterior probabilities as opposed to likelihoods. Access to posterior probabilities has enabled us to develop a number of novel approaches to confidence estimati...
متن کاملReal-time rich-content transcription of Chinese broadcast news
This paper describes the recent development of an Audio Indexing System for Chinese (Mandarin) broadcast news. Key issues of the three major components: automatic speech recognition, speaker identification and named entity extraction are addressed. The Chinese-language-specific challenges are discussed and our solutions are described. The recognition accuracy of the final system is comparable t...
متن کاملLarge vocabulary continuous speech recognition of Broadcast News - The Philips/RWTH approach
Automatic speech recognition of real-live broadcast news (BN) data (Hub-4) has become a challenging research topic in recent years. This paper summarizes our key efforts to build a large vocabulary continuous speech recognition system for the heterogenous BN task without inducing undesired complexity and computational resources. These key efforts included: • automatic segmentation of the audio ...
متن کامل